{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Dataflowr](https://raw.githubusercontent.com/dataflowr/website/master/_assets/dataflowr_logo.png)](https://dataflowr.github.io/website/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "8FM8CQHKcfhx"
   },
   "source": [
    "# Practical 2: Simple implementation of backprop\n",
    "\n",
    "[Video timestamp](https://youtu.be/Z6H3zakmn6E?t=2640)\n",
    "\n",
    "Here we implement a simple backpropagation algorithm with `numpy` for the following problem:\n",
    "\n",
    "We generate points $(x_t,y_t)$ where $y_t= \\exp(w^*x_t+b^*)$, i.e $y^*_t$ is obtained by applying a deterministic function to $x_t$ with parameters $w^*$ and $b^*$. Our goal is to recover the parameters $w^*$ and $b^*$ from the observations $(x_t,y_t)$.\n",
    "\n",
    "To do this, we use SGD to minimize $\\sum_t(y^t - \\exp(w x_t+b))^2$ with respect to $w$ and $b$.\n",
    "\n",
    "In these practicals, we will implement **Stochastic Gradient Descent with minibatches of size 1** and not Batched Gradient Descent as seen in lesson 2.\n",
    "\n",
    "The modification to the algorithm is as follows: we want to minimize the loss given by:\n",
    "$$\n",
    "loss = \\sum_t\\underbrace{\\left(\\exp(w x_t+b)-y_t \\right)^2}_{loss_t}.\n",
    "$$\n",
    "\n",
    "To minimize the loss we first compute the gradient of each $loss_t$:\n",
    "$\\frac{\\partial{loss_t}}{\\partial w}$, and $\\frac{\\partial{loss_t}}{\\partial b}$.\n",
    "\n",
    "For one epoch, **Stochastic Gradient Descent with minibatches of size 1** then updates the weigts and bias by running the following loop: \n",
    "\n",
    "for $t \\in \\{1,\\dots,30\\}$, \n",
    "\n",
    "\\begin{eqnarray*}\n",
    "w_{t+1}&=&w^1_{t}-\\alpha\\frac{\\partial{loss_t}}{\\partial w} \\\\\n",
    "b_{t+1}&=&b_{t}-\\alpha\\frac{\\partial{loss_t}}{\\partial b},\n",
    "\\end{eqnarray*}\n",
    "\n",
    "if $t = 30$, set $w_1=w_{31}$, and $b_1=b_{31}$.\n",
    "\n",
    "$\\alpha>0$ is called the learning rate.\n",
    "\n",
    "Then we run several epochs...\n",
    "\n",
    "We see that one epoch is much less costly than in the Batched Gradient Descent algorithm as you only need to have acces to one sample to make one update, whereas in the batched setting, you need to make a pass on the whole dataset to make an update. Of course, we are not computing the gradient here and the partial derivatives can be seen as noisy estimation of the gradient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "z-62DGjicfh3"
   },
   "outputs": [],
   "source": [
    "# You do not need PyTorch for these practicals\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "SQlF4jhfcfiD"
   },
   "outputs": [],
   "source": [
    "w, b = 0.5, 2\n",
    "xx = np.arange(0,1,.01)\n",
    "yy = np.exp(w*xx+b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "4OIvcmsqcfiL"
   },
   "outputs": [],
   "source": [
    "plt.plot(yy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "siSZvSBBcfiW"
   },
   "source": [
    "Following what we just saw in the course, you need to implement each of the basic operations: `(.*w), (.+b), exp(.)` with a forward method, a backward method and a step method."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Xfy5oCZBMnVj"
   },
   "source": [
    "If you are not familiar with [classes](https://docs.python.org/3/tutorial/classes.html) in Python, you should learn its basics as it will be required to follow the rest of the course.\n",
    "\n",
    "To help you, I have implemented the `(.+b)` operation as a Python class below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "PeY2DlqNcfiZ"
   },
   "outputs": [],
   "source": [
    "class add_bias(object):\n",
    "    def __init__(self,b):\n",
    "        # initialize with a bias b\n",
    "        self.b = b\n",
    "        \n",
    "    def forward(self, x):\n",
    "        # return the result of adding the bias      \n",
    "        return x + self.b\n",
    "    \n",
    "    def backward(self,grad):\n",
    "        # save the gradient (to update the bias in the step method) and return the gradient backward\n",
    "        self.grad = grad\n",
    "        return grad\n",
    "    \n",
    "    def step(self, learning_rate):\n",
    "        # update the bias\n",
    "        self.b -= learning_rate*self.grad        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UBLMKY1nMnVm"
   },
   "source": [
    "Consider now a simpler problem where you have $z_t = x_t+b^*$ and your task is to estmiate $b^*$ by minimizing the loss $\\sum_t(x_t+b-z_t)^2$ as a function of $b$ with SGD. You can use the `add_bias` defined above as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "lZpxeNqQMnVn"
   },
   "outputs": [],
   "source": [
    "# first compute the z_t with a true bias of 5:\n",
    "zz = xx+5\n",
    "\n",
    "#start with an initial guess of 1 for the bias:\n",
    "My_add_bias = add_bias(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "NLoIijKPMnVq"
   },
   "outputs": [],
   "source": [
    "j = 10\n",
    "# your prediction will be for each sample\n",
    "z_pred = My_add_bias.forward(xx[j])\n",
    "z_pred"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "-ppGMgc7MnVt"
   },
   "outputs": [],
   "source": [
    "# start with the gradient of the quadratic loss\n",
    "grad = 2*(z_pred-zz[j])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "lVIy0xeAMnVv"
   },
   "outputs": [],
   "source": [
    "# backpropagate the gradient to the parameter b\n",
    "My_add_bias.backward(grad)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "cJtZICCiMnVy"
   },
   "outputs": [],
   "source": [
    "# make an update of the bias\n",
    "My_add_bias.step(1e-2)\n",
    "My_add_bias.b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "3ZWF5vaTMnV0"
   },
   "source": [
    "The code above corresponds to one SGD update.\n",
    "Below, I coded the training loop for SGD where the update on the parameter is done each time you see a sample: for each sample $j$, you compute the associated "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "_p1nC0dpMnV1"
   },
   "outputs": [],
   "source": [
    "My_add_bias = add_bias(1)\n",
    "estimated_b = [1]\n",
    "for i in range(500):\n",
    "    # take a random indice\n",
    "    j = np.random.randint(1, len(xx))\n",
    "    z_pred = My_add_bias.forward(xx[j])\n",
    "    grad = 2*(z_pred-zz[j])\n",
    "    _ = My_add_bias.backward(grad)\n",
    "    My_add_bias.step(1e-2)\n",
    "    estimated_b.append(My_add_bias.b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "y3cekk-nMnV4"
   },
   "outputs": [],
   "source": [
    "plt.plot(estimated_b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "EaqpsHK6MnV6"
   },
   "source": [
    "Although SGD is computing a noisy version of the gradient, we see that SGD converges to the right solution in this case.\n",
    "\n",
    "Now it is your turn!\n",
    "Following what we just saw in the course, you need to implement each of the basic operations: `(.*w), exp(.)` with a forward method, a backward method and a step method.\n",
    "\n",
    "![backprop3](https://dataflowr.github.io/notebooks/Module2/img/backprop3.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "58L4gjz1MnV7"
   },
   "outputs": [],
   "source": [
    "class multiplication_weight(object):\n",
    "    def __init__(self, w):\n",
    "        # initialize with a weight w\n",
    "        \n",
    "    def forward(self, x):\n",
    "        # return the result of multiplying by weight\n",
    "               \n",
    "    def backward(self,grad):\n",
    "        # save the gradient and return the gradient backward\n",
    "            \n",
    "    def step(self, learning_rate):\n",
    "        # update the weight\n",
    "        \n",
    "class my_exp(object):\n",
    "    # no parameter\n",
    "    def forward(self, x):\n",
    "        # return exp(x)\n",
    "            \n",
    "    def backward(self,grad):\n",
    "        # return the gradient backward\n",
    "            \n",
    "    def step(self, learning_rate):\n",
    "        # any parameter to update?\n",
    "        # Hint https://docs.python.org/3/reference/simple_stmts.html#the-pass-statement\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "koymOfVlcfih"
   },
   "source": [
    "Now, you will need to compose sequentially these operations and here you need to code a class composing operations. This class will have a forward, a backward and a step method and also a compute_loss method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "EM5M8GCecfik"
   },
   "outputs": [],
   "source": [
    "class my_composition(object):\n",
    "    def __init__(self, layers):\n",
    "        # initialize with all the operations (called layers here!) in the right order...\n",
    "                \n",
    "    def forward(self, x):\n",
    "        # apply the forward method of each layer\n",
    "            \n",
    "    def compute_loss(self, y_est, y_target):\n",
    "        # use the L2 loss\n",
    "        # return the loss and save the gradient of the loss\n",
    "            \n",
    "    def backward(self):\n",
    "        # apply backprop sequentially, starting from the gradient of the loss\n",
    "        # Hint: https://docs.python.org/3/library/functions.html#reversed\n",
    "            \n",
    "    def step(self, learning_rate):\n",
    "        # apply the step method of each layer\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Hs1M-kebcfir"
   },
   "source": [
    "Now you need to code the 'training' loop. Keep track of the loss, weight and bias computed at each epoch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "IWJgoCEKcfiu"
   },
   "outputs": [],
   "source": [
    "my_fit = my_composition([multiplication_weight(1),add_bias(1), my_exp()])\n",
    "learning_rate = 1e-4\n",
    "losses =[]\n",
    "ws = []\n",
    "bs = []\n",
    "for i in range(5000):\n",
    "    # take a random indice\n",
    "    j = np.random.randint(1, len(xx))\n",
    "    # you can compare with\n",
    "    #j = i % len(xx)\n",
    "    # compute the estimated value of y from xx[j] with the current values of the parameters\n",
    "    \n",
    "    # compute the loss and save it\n",
    "    \n",
    "    # update the parameters \n",
    "    \n",
    "    #and save them\n",
    "    ws.append(my_fit.layers[0].w)\n",
    "    bs.append(my_fit.layers[1].b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "V7mp0eEEcfi2"
   },
   "outputs": [],
   "source": [
    "my_fit.layers[0].w"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "t54ayZNDcfi9"
   },
   "outputs": [],
   "source": [
    "my_fit.layers[1].b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "4Gqz5CxzcfjG"
   },
   "outputs": [],
   "source": [
    "plt.plot(losses)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "kaHnaX8jcfjN"
   },
   "outputs": [],
   "source": [
    "plt.plot(bs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ch28XtPUcfjV"
   },
   "outputs": [],
   "source": [
    "plt.plot(ws)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Q4Qb9d65NFpo"
   },
   "source": [
    "Now you understand how Pytorch deals with automatic differentiation!!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "x9x7Vwc_MnWT"
   },
   "source": [
    "[![Dataflowr](https://raw.githubusercontent.com/dataflowr/website/master/_assets/dataflowr_logo.png)](https://dataflowr.github.io/website/)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "include_colab_link": true,
   "name": "02_backprop.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
